Data Mining as a Method for Linguistic Analysis: Dutch Diminutives*

Authors

  • Walter Daelemans
  • Steven Gillis
Abstract

We propose to use data mining techniques (inductive techniques for the automatic acquisition of comprehensible knowledge from data) as a method in linguistic analysis. In the past, such techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper we show that they can also assist in linguistic theory formation by providing a new tool for the evaluation of linguistic hypotheses, for the extraction of rules from corpora, and for the discovery of useful linguistic categories. By applying a rule induction method to a particular linguistic task (diminutive formation in Dutch), we show that data mining techniques can be used to test linguistic hypotheses about this morphological process and to discover interesting morphological and phonological rules and categories.

* Preparation of this paper was supported by a Research Grant of the Fund for Joint Basic Research (FKFO 2.0101.94) of the National Fund for Scientific Research (NFWO), by a VNC project of NFWO NWO (contract number G.2201.96), and by a grant from the Research Council of the University of Antwerp.


Similar resources

Diminutives facilitate word segmentation in natural speech: cross-linguistic evidence.

Final-syllable invariance is characteristic of diminutives (e.g., doggie), which are a pervasive feature of the child-directed speech registers of many languages. Invariance in word endings has been shown to facilitate word segmentation (Kempe, Brooks, & Gillis, 2005) in an incidental-learning paradigm in which synthesized Dutch pseudonouns were used. To broaden the cross-linguistic evidence fo...


DOMAIN DATABASE KNOWLEDGE Incompleteness

There are several different ways data mining (the automatic induction of knowledge from data) can be applied to the problem of natural language processing. In the past, data mining techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper, we show that they can also assist in linguistic theory formation by providing a new to...


DOMAIN DATABASE KNOWLEDGE Incompleteness Noise

There are several different ways data mining (the automatic induction of knowledge from data) can be applied to the problem of natural language processing. In the past, data mining techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper, we show that they can also assist in linguistic theory formation by providing a new tool for...


Belgian Dutch versus Netherlandic Dutch: New patterns of divergence? On pronouns of address and diminutives

The linguistic climate in northern Belgium (Flanders) has been changing in recent years. A new corpus of spoken Dutch meets the need for data reflecting actual and present-day language use in this part of the Dutch language area. The ‘Spoken Dutch Corpus’ allows us to uncover and analyse the present state of colloquial Belgian Dutch and the changes which mark this condition. This paper discusse...


Diminutives in child-directed speech supplement metric with distributional word segmentation cues.

In two experiments, we explored whether diminutives (e.g., birdie, Patty, bootie), which are characteristic of child-directed speech in many languages, aid word segmentation by regularizing stress patterns and word endings. In an implicit learning task, adult native speakers of English were exposed to a continuous stream of synthesized Dutch nonsense input comprising 300 randomized repetitions ...




Publication date: 1997